ABSTRACT

A thermodynamic framework is presented to character- 
ize the evolution of efficiency, order, and quality in so- 
cial content production systems, and this framework is 
applied to the analysis of Wikipedia. Contributing ed- 
itors are characterized by their (creative) energy levels 
in terms of number of edits. We develop a definition of 
entropy that can be used to analyze the efficiency of the 
system as a whole, and relate it to the evolution of power- 
law distributions and a metric of quality. The concept 
is applied to the analysis of eight years of Wikipedia 
editing data and results show that (1) Wikipedia has 
become more efficient during its evolution and (2) the 
entropy-based efficiency metric has high correlation with 
observed readership of Wikipedia pages. 



I. INTRODUCTION 

A social production system (Benkler & Nissenbaum
2006) is characterized by a process in which the creative
energies of a large number of people contribute to
large projects, mainly without traditional, centralized,
hierarchical organizational mechanisms. One question
that may be asked is whether such systems collectively
adapt to harness the creative energies of contributors
more efficiently and effectively, and to produce higher quality
outputs. In this paper, we present a thermodynamic
framework that allows us to characterize the efficiency 
of social production systems in several ways. We an- 
alyze the evolution of efficiency in Wikipedia and show 
how this relates to content quality in terms of readership 
demand. 

In this paper, we consider the social production of con- 
tent in Wikipedia as an open thermodynamic system. 
Each editor corresponds to a particle and the number of 
edits associates with a level of particle energy. We then 
exploit the concept of entropy in statistical mechanics to 
understand the underlying principles for efficiency in so- 
cial production systems. We apply these insights to ana- 
lyze the evolution of efficiency and quality in Wikipedia. 

We also explain the observed power-law distribution of 
editing activity as a consequence of how edits relate 
to "energy" devoted to Wikipedia in a thermodynamic 
setting. At an aggregate level, the power-law distribution
of user activity on peer-production websites could
arise through two mechanisms. First, the observed diversity
of behavior could reflect an extreme heterogeneity
of preference among the potential user population
(Wikipedia editors are "born") (Panciera, Halfaker & 
Terveen 2009). Second, the behavior could be due to
diversity of experience of new users after they start par- 
ticipating on the site, e.g., positive or negative feed- 
back from other users on their contributions (Wikipedia 
editors are "made") (Wilkinson 2008) (Ren, Kraut & 
Kiesler 2007). In contrast to these studies, we consider a 
third possibility, namely that the diversity arises from a 
relatively short-term decrease in effort required to make 
additional contributions with experience of editing. This 
corresponds to the general improvement people have 
with cognitive tasks (Newell & Rosenbloom 1981)(Pirolli 
& Anderson 1985)(Shrager, Hogg & Huberman 1988). 
Specifically, we propose that the power-law behaviors at 
the level of contributions arise largely due to the decreas- 
ing effort required for a given user to make additional ed- 
its in a relatively short period of time (e.g., one month) 
or to a particular page. This leads to a logarithmic en- 
ergy model for edits. 

With each number of edits $v$ we associate "energy"
$\log(v)$. The base of the logarithm is arbitrary, and we use
the natural logarithm, which is the common convention 
in statistical mechanics. This logarithmic dependence 
means, for instance, it is easier for someone to make their 
10th edit, after making 9 already, than it is to make their 
2nd edit. This is reasonable — people gain experience 
with the subject matter of the page so presumably can 
contribute an edit with less time. The fact that people 
who edit a lot return more often in monthly statistics of 
Wikipedia supports the logarithmic energy assumption. 
Observations that user activity rates are lognormally distributed
(Hogg & Szabo 2009) (Hogg & Lerman 2010),
and that multiplicative processes result in power-law distributions
(Mitzenmacher 2004), provide further evidence.

We place social production in the context of statistical
mechanics by defining the notions of entropy,
energy, temperature and free energy, and deriving
their relationships in power-law distributions. With the
Wikipedia editing statistics, we show that entropy effi- 
ciency (entropy per energy) and entropy reduction (rel- 
ative entropy w.r.t. its maximum) are good metrics for 
quantifying efficiency and quality of the collaboration. 

The rest of this paper is organized as follows. Section II 
defines the entropy-related metrics and discusses their relationship
with power-law distributions. Section III looks
at monthly editing data in eight years of Wikipedia and
illustrates the evolution of entropy-based metrics. Section
IV analyzes page-wise entropy metrics and their relationship
to quality of production. With a set of readership
data, we show high correlation between entropy
efficiency and readership for Wikipedia pages.

II. METRICS FOR ORDER AND EFFICIENCY

We model social collaboration as an open thermody- 
namic system consisting of a set of particles, each hold- 
ing a certain level of energy. We define a set of metrics 
analogous to thermodynamic quantities: energy, entropy 
and temperature, to measure the order and efficiency of 
such a system. We show that under a logarithmic en- 
ergy level, a system would self-organize with power-law 
distributions under thermodynamic principles. 

Entropy, Energy and Temperature 

Let $I$ be a set of individuals, e.g., a set of particles or
editors, and $V$ be a set of positive values that an individual
can hold, e.g., energy of a particle or the number
of edits. A collection is a mapping $I \to V$. For the
Wikipedia editing system, $I$ is a set of editors, $V$ is
the set of positive integers, and $v_i$ is the number of edits
editor $i$ contributed. Here a collection can be as large
as the whole collaborative community, say, the whole 
Wikipedia, or as small as a sub-community, say, a page. 

Let $s_v$ be the number of individuals in $I$ with value $v$ in
$V$, and $N = |I|$ be the total number of individuals in the
collection. We define entropy for such a collection as:

$S = -\sum_{v \in V} p_v \log(p_v)$, where $p_v = \frac{s_v}{N}$.    (1)



In this definition, if all individuals have the same value,
$S$ is minimized at 0. If, on the other hand, all individuals
have different values, $S$ is maximized at
$\log(N)$. The collection is in high order if the entropy is
low, i.e., contributions are even among individuals, and in
disorder if the entropy is high, i.e., contributions diverge
among individuals. A complex and effective
system would be in a state between order and disorder
(Mitchell, Crutchfield & Hraber 1993).
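
The entropy of Eq. (1) can be sketched in a few lines of Python. This is only an illustration of the definition; the function name `collection_entropy` and the sample edit counts are hypothetical and are not taken from the Wikipedia data.

```python
import math
from collections import Counter

def collection_entropy(values):
    """Entropy S = -sum_v p_v log(p_v) of a collection (Eq. 1), natural log."""
    n = len(values)
    if n == 0:
        raise ValueError("collection must contain at least one individual")
    counts = Counter(values)  # s_v: number of individuals holding value v
    return sum(-(s / n) * math.log(s / n) for s in counts.values())

# Hypothetical edit counts, one entry per editor.
same = [5, 5, 5, 5]       # every editor holds the same value: minimum entropy
distinct = [1, 2, 3, 4]   # every editor holds a different value: maximum entropy
print(collection_entropy(same), collection_entropy(distinct), math.log(4))
```

The two sample collections correspond to the two extreme cases discussed above: identical values give the minimum, and pairwise distinct values give $\log(N)$.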

The entropy $S$ can also be defined for a collection with
an infinite number of individuals; it has a physical meaning
and is related to energy and temperature in thermodynamics.
When a thermodynamic system has a large
number of independent particles, according to the Boltzmann
distribution (Ma 1985),

$p_u \propto e^{-u/(kT)}$    (2)



where $p_u$ is the probability of a particle being in a given state
with energy level $u$, $k$ is the Boltzmann constant, and $T$ is
temperature. The Boltzmann distribution ensures that
the particles have an exponential distribution with respect
to the energy levels, i.e., high energy particles in a given
state are much less likely than lower energy particles.
Entropy $S$ for this distribution becomes:

$S = -\sum_u p_u \log(p_u) = \frac{E}{kT} + \log(Z)$    (3)

where $E = \sum_u p_u u$ is the average energy per particle,

$Z = \sum_u e^{-u/(kT)}$    (4)



is the partition function, and $S$ corresponds to thermodynamic
entropy per particle. Note that Eq. (3) can be
rewritten as

$E = kTS - kT\log(Z)$    (5)

and

$A = E - kTS = -kT\log(Z)$    (6)

is called free energy (Goodstein 1975). Free energy is
an important concept in thermodynamics: it is the
amount of useful energy that can be converted to work.

Now assuming that the energy for the number of edits
is proportional to its logarithmic value (as argued
in Section I), i.e., $u = \log(v)$, and that the Boltzmann
distribution holds for logical particles, we have:

$p_v \propto e^{-\log(v)/(kT)}$, i.e., $p_v \propto v^{-\alpha}$, where $\alpha = \frac{1}{kT}$    (7)

which is a power-law distribution over values in $V$, with
power-law coefficient $\alpha$. In other words, $\alpha$ corresponds
to the inverse of temperature. The higher the $\alpha$, the
lower the temperature.

Power-law distributions have been discovered in many 
social media including Wikipedia (Wilkinson 2008). 
From thermodynamic principles, we argue that they 
come naturally because the energy level is logarithmic 
in terms of the levels of the activities. 
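
The step from the Boltzmann distribution to a power law can be checked numerically. The sketch below assumes an illustrative temperature `kT = 0.5` and hypothetical edit counts 1 to 5; it only confirms that Boltzmann weights with logarithmic energy coincide with power-law weights whose exponent is the inverse temperature.

```python
import math

def boltzmann_weights(values, kT):
    """Unnormalized Boltzmann weights exp(-u / kT) with energy u = log(v)."""
    return [math.exp(-math.log(v) / kT) for v in values]

def power_law_weights(values, alpha):
    """Unnormalized power-law weights v ** (-alpha)."""
    return [v ** (-alpha) for v in values]

kT = 0.5                      # illustrative temperature (assumption)
alpha = 1.0 / kT              # the power-law coefficient is the inverse temperature
edits = [1, 2, 3, 4, 5]       # hypothetical edit counts
for b, p in zip(boltzmann_weights(edits, kT), power_law_weights(edits, alpha)):
    assert math.isclose(b, p)  # both weightings agree up to rounding
```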



Entropy Reduction and Entropy Efficiency 

Entropy measures disorder or the amount of uncertainty 
in the system. In contrast to entropy, entropy reduction 
(i.e., effective entropy (Tononi 2008)) is a relative en- 
tropy with respect to the maximum entropy given the 
number of individuals in the collection: 



$R = \log(N) - S$    (8)



where $N$ is the total number of individuals and $S$ is
the entropy defined in Eq. (1). $R$ is maximum when $S$
is minimum: $S$ measures disorder, whereas $R$ measures order,
and in particular, the amount of entropy reduction due
to order. If the collection is from a random uniform
distribution, $S$ will be close to $\log(N)$ and $R$ close to 0.

Let $E$ be the average energy defined in Eq. (3). We define
entropy efficiency as average entropy per unit energy level:

$Q = \frac{S}{E}$    (9)

Note that according to Eq. (3),

$Q = \frac{S}{E} = \frac{1}{kT} + \frac{\log(Z)}{E}$    (10)
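
As a minimal sketch of how the metrics defined so far could be computed for one collection, the Python function below takes a list of edit counts and returns entropy, entropy reduction, average logarithmic energy, and entropy efficiency. The function name, the handling of the degenerate case $E = 0$, and the sample data are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy_metrics(edit_counts):
    """Return (S, R, E, Q) for a list of positive edit counts, one per editor.

    S: entropy (Eq. 1); R: entropy reduction log(N) - S (Eq. 8);
    E: average logarithmic energy, the mean of log(v_i); Q: S / E (Eq. 9).
    """
    n = len(edit_counts)
    if n == 0 or min(edit_counts) < 1:
        raise ValueError("need at least one editor, each with >= 1 edit")
    counts = Counter(edit_counts)
    s = sum(-(c / n) * math.log(c / n) for c in counts.values())
    r = math.log(n) - s
    e = sum(math.log(v) for v in edit_counts) / n
    # Q is undefined when every editor has exactly one edit (E = 0).
    q = s / e if e > 0 else float("nan")
    return s, r, e, q

# Hypothetical page: several one-edit editors and a few heavier editors.
S, R, E, Q = entropy_metrics([1, 1, 1, 1, 2, 2, 4, 8])
print(S, R, E, Q)
```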



The following theorem claims that power-law distributions
maximize entropy efficiency for logarithmic energy
levels.

Theorem 1. Power-law distributions maximize entropy
efficiency $Q$ (Eq. (9)) when $E = \sum_v p_v \log(v)$.

Proof. The proof is from (Mitzenmacher 2004). To
maximize Eq. (9), let the derivative of $Q$ with respect to
$p_v$ be 0. The derivative of $Q$ with respect to $p_v$ is

$\frac{\partial Q}{\partial p_v} = \left(\frac{\partial S}{\partial p_v} E - \frac{\partial E}{\partial p_v} S\right) / E^2$,

i.e., taking the derivative of $S$ w.r.t. $p_v$ in Eq. (1) and the
derivative of the logarithmic energy $E$ w.r.t. $p_v$, we have

$[-(\log(p_v) + 1)E - \log(v)S] / E^2$.

When it is 0, we have

$\log(p_v) + 1 = -\log(v)\frac{S}{E} = -\alpha\log(v)$    (11)

with $\alpha = \frac{S}{E} = Q$, i.e., $p_v \propto v^{-\alpha}$.    □

Theorem 1 indicates not only that the power law maximizes
the entropy efficiency $Q$, but also that the power-law coefficient
$\alpha$ approximates $Q$.

Note that entropy efficiency can also be defined for linear
energy levels with $E = \sum_v p_v v$. In this case, maximizing
$Q$ as in Eq. (11) gives

$\log(p_v) + 1 = -\frac{S}{E} v = -\alpha v$    (12)

i.e., $p_v \propto e^{-\alpha v}$, an exponential distribution like the Boltzmann
distribution. Therefore, entropy efficiency is maximized
when the corresponding distribution matches the
energy levels: power-law for logarithmic and exponential
for linear. Analogous to entropy maximization in
the second law of thermodynamics for closed systems,
maximizing entropy efficiency is another general principle:
a system would use the minimum energy to produce the same
amount of entropy.

The relationship between entropy efficiency $Q$ (shown in
Eq. (10)) and free energy $A$ (shown in Eq. (6)) is obtained
by combining these two equations:

$\frac{Q}{\alpha} = \frac{E - A}{E}$    (13)

We call $\frac{Q}{\alpha}$ the free energy reduction ratio.
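
The identities in Eq. (10) and Eq. (13) can be verified numerically on a normalized power law. The sketch below assumes a power law truncated to the finite support 1..`v_max`, with illustrative values `alpha = 2.0` and `v_max = 1000`; the truncation and all names are assumptions for illustration only.

```python
import math

def truncated_power_law(alpha, v_max):
    """Return values 1..v_max, probabilities p_v = v**-alpha / Z, and Z."""
    values = list(range(1, v_max + 1))
    weights = [v ** (-alpha) for v in values]
    z = sum(weights)  # partition function (Eq. 4) restricted to a finite support
    return values, [w / z for w in weights], z

alpha, v_max = 2.0, 1000  # illustrative choices (assumptions)
values, probs, Z = truncated_power_law(alpha, v_max)

S = -sum(p * math.log(p) for p in probs)                  # entropy
E = sum(p * math.log(v) for p, v in zip(probs, values))   # average logarithmic energy
Q = S / E                                                 # entropy efficiency (Eq. 9)
A = -math.log(Z) / alpha                                  # free energy (Eq. 6) with kT = 1/alpha

assert math.isclose(Q, alpha + math.log(Z) / E)           # Eq. 10
assert math.isclose(Q / alpha, (E - A) / E)               # Eq. 13
```

Because $Z > 1$ on this support, the check also makes visible that $Q$ exceeds $\alpha$ by the term $\log(Z)/E$, which is why $\alpha$ only approximates $Q$.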



Entropy Related Properties in Power-law 

Power-law distributions are dominant in the Wikipedia
editing statistics, where their properties can be fully captured
by the power-law coefficient $\alpha$. In the rest of this
section, we show how the change of $\alpha$ affects entropy $S$,
entropy efficiency $Q$ and entropy reduction $R$, as well as
energy per particle $E$ and free energy per particle $A$.

Figure 1 shows the change of entropy, entropy efficiency
and entropy reduction, for two sample sizes, 1000
and 10^, with a varying power-law coefficient $\alpha$, where
dashed lines are for entropy $S$, entropy efficiency $Q$ or
entropy reduction $R$ in the uniform distribution with the
corresponding number of samples. We observe that:

• Both entropy and entropy reduction grow with
the number of samples. However, entropy efficiency is
independent of the number of samples. Entropy efficiency
grows almost linearly with the power-law coefficient.

• Uniform distributions have maximum entropy, minimum
entropy efficiency and minimum entropy reduction.
Clearly, random uniform distributions do not
have order or efficiency.

• Entropy decreases and entropy reduction increases
with the power-law coefficient. However, the
rates of decrease and increase diminish as $\alpha$ increases,
and both entropy and entropy reduction
saturate when $\alpha$ approaches 2.5.

Figure 1. Entropy, entropy efficiency and entropy reduction
vs. power-law coefficients, in two sample sizes: 10^
and 10^.

For the Wikipedia editing statistics, the minimum value
is one edit. For power-law distributions with minimum
value 1, the relationship between the average energy and
the power-law coefficient becomes:

$E = \frac{1}{\alpha - 1}$    (14)

where $E$ is the average energy per particle. The proof of
this relationship is in the Appendix. This implies that
(1) $\alpha > 1$ (or $kT < 1$), and (2) as $\alpha \to 1$, $E \to \infty$. Also, if
the minimum $v$ is 1, we have

$Z = \zeta(\alpha)$    (15)



where $\zeta(\cdot)$ is the zeta function. Therefore the free energy (Eq.
(6)) for a power-law distribution with coefficient $\alpha$ is

$A = -kT\log(Z) = -\frac{\log(Z)}{\alpha} = -\frac{\log(\zeta(\alpha))}{\alpha}$    (16)

Figure 2. Energy and free energy with a varying power-law
coefficient $\alpha$.
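
The closed forms in Eqs. (14) to (16) are easy to tabulate. The following sketch evaluates them for a few coefficients; the `zeta` helper is a simple numerical approximation written for this illustration (a direct partial sum plus the leading Euler-Maclaurin tail terms), not a routine used in the original analysis.

```python
import math

def zeta(alpha, terms=1000):
    """Approximate the Riemann zeta function for alpha > 1.

    Sums the first terms-1 terms directly and approximates the remaining
    tail by its integral plus half of the first omitted term.
    """
    if alpha <= 1:
        raise ValueError("zeta(alpha) diverges for alpha <= 1")
    head = sum(v ** (-alpha) for v in range(1, terms))
    tail = terms ** (1 - alpha) / (alpha - 1) + 0.5 * terms ** (-alpha)
    return head + tail

def average_energy(alpha):
    """Average energy per particle, E = 1 / (alpha - 1) (Eq. 14)."""
    return 1.0 / (alpha - 1.0)

def free_energy(alpha):
    """Free energy per particle, A = -log(zeta(alpha)) / alpha (Eq. 16)."""
    return -math.log(zeta(alpha)) / alpha

for a in (1.5, 2.0, 3.0, 4.0):
    print(f"alpha={a}: E={average_energy(a):.3f}, A={free_energy(a):.3f}")
```

Evaluating the two functions over a range of coefficients gives the kind of curves summarized in Figure 2: energy falls and free energy rises toward zero as the coefficient grows.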



Figure 2 shows the relationship between energy (Eq. (14))
and free energy (Eq. (16)) with a varying power-law coefficient
$\alpha$. We see that when $\alpha$ increases, or temperature
decreases, average energy per particle decreases and free
energy increases, both saturating when $\alpha$ approaches 4, or
$kT$ around 0.25.

From these observations, we arrive at an explanation of why,
in the real world, the power-law coefficient lies mostly
within $1 < \alpha < 3$. When $\alpha > 3$, both free energy and
entropy reduction are saturated, meaning it is not very
useful for $\alpha$ to be larger or to have lower temperature.

Power-law distributions show fractal structures at multiple
levels. In particular, if particles are grouped using
their logarithmic scales, the resulting distribution is still
a power law. Let class $n$ be the set of units whose values are
between $b^n$ and $b^{n+1}$, where $b > 1$ is an arbitrary base.
The total number of particles for class $n$ is $N(n)$, which is
proportional to:

$\int_{b^n}^{b^{n+1}} v^{-\alpha}\,dv = \frac{b^{-n(\alpha-1)}}{\alpha - 1}\left(1 - b^{-(\alpha-1)}\right) \propto b^{-n(\alpha-1)}$    (17)

and the total amount of values from class $n$ is $C(n)$, which
is proportional to:

$\int_{b^n}^{b^{n+1}} v^{-(\alpha-1)}\,dv = \frac{b^{-n(\alpha-2)}}{\alpha - 2}\left(1 - b^{-(\alpha-2)}\right) \propto b^{-n(\alpha-2)}$    (18)

Note that when $\alpha < 2$, $C(n)$ increases exponentially with
$n$, i.e., a larger share of energy comes from high energy
particles, and when $\alpha > 2$, $C(n)$ decreases exponentially
with $n$, i.e., a larger share of energy comes from low energy
class particles. When $\alpha = 2$, $C(n)$ is a constant,
i.e., all classes contribute evenly.
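
The three regimes can be illustrated with the proportionalities of Eqs. (17) and (18). The sketch below uses base 10 and four classes, both chosen arbitrarily for illustration; the function name is hypothetical.

```python
def class_weights(alpha, base, n_classes):
    """Relative number of particles N(n) and total value C(n) per class n.

    Uses only the proportionalities of Eqs. 17 and 18:
    N(n) ~ base**(-n*(alpha-1)) and C(n) ~ base**(-n*(alpha-2)).
    """
    counts = [base ** (-n * (alpha - 1)) for n in range(n_classes)]
    totals = [base ** (-n * (alpha - 2)) for n in range(n_classes)]
    return counts, totals

# alpha below, at, and above 2: rising, flat, and falling class totals.
for alpha in (1.5, 2.0, 2.5):
    _, totals = class_weights(alpha, base=10, n_classes=4)
    print(alpha, [round(t, 3) for t in totals])
```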



In conclusion, when $\alpha$ increases, entropy decreases, entropy
reduction and entropy efficiency increase, and
overall contributions are shifted from high energy particles
to low energy particles.



III. IS WIKIPEDIA BECOMING MORE EFFICIENT? 

We analyzed the Wikipedia editing data from January 
2002 to December 2009. Prior research has shown that 
the growth of Wikipedia follows Logistic or Gompertz
(Suh, Convertino, Chi & Pirolli 2009) curves and that power-law
distributions are everywhere (Wilkinson 2008). We
would like to find out what properties other than volume 
growth have changed during the evolution of Wikipedia. 
In particular, we would like to answer the question of 
the degree to which Wikipedia becomes more efficient, 
from thermodynamic principles. 

We consider Wikipedia as an open system with active ed- 
itors in each month as particles, and their total number 
of edits of the month as values. The fact that the dis- 
tributions of edits are power-law suggests that energy is 
logarithmic in terms of the number of edits. The system 
is open in the sense that there are editors joining and 
dropping from month to month. The sum of the loga- 
rithmic contributions from active editors (with minimum 
one edit) is the total energy of the system for the month. 
The number of active editors ranges from 1000 in early 
months to 600,000 in later months. We give an overview of
the evolution of entropy, entropy reduction, and entropy
efficiency of Wikipedia's monthly editing activities.

Figure 3(a) shows entropy efficiency and power-law coefficient
over the 96 months of the evolution. As we
observe, the power-law coefficient has grown steadily from 1.5 to
2.0 over the months and entropy efficiency has
grown at almost the same rate. Figure 3(b) shows
the evolution of entropy and entropy reduction over 96
months of the history. Here the decrease of entropy
and the increase of entropy reduction suggest increasing
order in the editing system. Figure 3(c) shows the
change of energy per editor, the growth of free energy
and the free energy reduction ratio. The ratio has been
growing but saturated (at 40 months) almost 20 months
before the saturation of the active editors (at 60 months).
This seems to suggest that the saturation of the free energy
reduction ratio may be the cause of the saturation
of the number of editors.

Another interesting question is: how does the editor
structure evolve? We classify editors according to their
levels of energy, or logarithm of their edits, i.e., 1-10
edits as class 1, 11-100 edits as class 2, 101-1000 edits as
class 3, etc. As we discussed in Section II, the number
of editors in class $n$ is proportional to $10^{-n(\alpha-1)}$ (Eq.
(17)), and the total edits from class $n$ is proportional to
$10^{-n(\alpha-2)}$. Since $\alpha$ has been increasing over the
months, contributions are shifted from higher classes to
lower classes, and are now relatively even across all classes
since $\alpha$ approaches 2 (from Eq. (18)). This confirms that
Wikipedia is becoming a medium for the masses in later
months, rather than for elites as in early months. Figure 4
and Figure 5 show the total number of editors in each
class and total contributions from each class, respectively,
during 96 months of evolution. As we observe,
except for high level classes which are noisy, the logarithmic
volume of each class is proportional to the class index
(Figure 4) and overall contributions from each class are
becoming relatively even (Figure 5), a result of the increasing
power-law coefficient in power-law distributions.

Figure 3. (a) Evolution of entropy efficiency and power-law
coefficient in Wikipedia. (b) Evolution of entropy and
entropy reduction in Wikipedia over 96 months. The solid
line indicates the maximum entropy, $\log(N)$, where $N$ is
the number of editors of the month. (c) Evolution of energy
and free energy per editor, and free energy reduction
ratio.

Figure 4. The total number of active editors from different
classes over 96 months.

Figure 5. Total number of edits from each class over 96
months.
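
The class assignment described above (1-10 edits as class 1, 11-100 as class 2, and so on) can be sketched as follows. The helper names and the sample month are hypothetical; the digit-counting trick is simply one convenient way to avoid floating-point rounding at exact powers of ten.

```python
from collections import defaultdict

def editor_class(edits):
    """Class index for an edit count: 1-10 -> 1, 11-100 -> 2, 101-1000 -> 3, ...

    Counting the decimal digits of (edits - 1) avoids floating-point
    rounding at exact powers of ten.
    """
    if edits < 1:
        raise ValueError("an active editor has at least one edit")
    return len(str(edits - 1))

def class_totals(edit_counts):
    """Return {class: (number of editors, total edits)} for one month."""
    totals = defaultdict(lambda: [0, 0])
    for v in edit_counts:
        entry = totals[editor_class(v)]
        entry[0] += 1   # one more editor in this class
        entry[1] += v   # that editor's edits count toward the class total
    return {c: tuple(t) for c, t in sorted(totals.items())}

# Hypothetical monthly edit counts, one per active editor.
print(class_totals([1, 3, 10, 11, 100, 101, 2500]))
```

Applying `class_totals` month by month would yield per-class editor counts and edit totals of the kind plotted in Figures 4 and 5.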

Entropy-based metrics support previous findings about
collaborative editing and bring new perspectives. In
(Wilkinson 2008), the author reported that for a web
medium, the higher the power-law coefficient $\alpha$, the higher the entry
barrier for an author to contribute a new edit.
For example, it is easier to contribute a new edit to Digg
(low $\alpha$), whereas it is harder to contribute a new edit to
Wikipedia (high $\alpha$). We see that in Wikipedia, $\alpha$ has
been growing steadily (Figure 3(a)). Due to increasing
order and efficiency, it is getting harder for an editor to add
more edits.

In conclusion, we believe Wikipedia has become more
efficient in terms of entropy efficiency, and more ordered
according to entropy reduction. The increasing power-law
coefficient causes the shift of the contributions from
elites to the crowd. The saturation of the free energy reduction
ratio may cause the saturation of the active editors.

IV. DOES EFFICIENCY IMPLY QUALITY?

We have noticed that Wikipedia as a whole has evolved
to be more efficient. We would now like to see if those measurements
are applicable to identifying the quality of pages.
Based on this analysis, we examine (1) how the efficiency 
of a page correlates to its readership or quality, and (2) 
what is the major factor that separates high and low 
quality pages? 

Data Setting 

We choose the pages with at least 4500 edits and that are 
saturated as of December 2009. By saturation we mean 
that a page has gained less than 5% of its edits in the latest
10% of the time since its creation. This selection ensures
that each page being analyzed has gained a stable edit- 
ing structure that is less noisy. Accordingly, there are 
a total of 962 pages being analyzed, and entropy-based 
measurements for each page are computed. 
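
The saturation criterion can be expressed as a short predicate. The sketch below is one possible reading of the rule; the representation of times as plain numbers, the strict inequalities at the boundaries, and the function name are all assumptions made for illustration.

```python
def is_saturated(edit_times, created, now, edit_share=0.05, time_share=0.10):
    """True if fewer than `edit_share` of all edits fall in the last
    `time_share` of the page's lifetime (created .. now).

    Times are plain numbers (e.g. days or Unix seconds); this
    representation is an assumption for illustration.
    """
    if not edit_times or now <= created:
        raise ValueError("need at least one edit and a positive lifetime")
    cutoff = now - time_share * (now - created)
    recent = sum(1 for t in edit_times if t > cutoff)
    return recent / len(edit_times) < edit_share

# Hypothetical page: all edits early in its life, none in the last 10% of time.
print(is_saturated(list(range(100)), created=0, now=1000))
```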

We have shown that power-law distributions maximize 
entropy efficiency for logarithmic energy levels. To 
distinguish power-law pages from non-power-law pages,
we use the Kolmogorov-Smirnov statistic $D$ (Casella &
Berger 2001), which is the maximum difference of the 
fitted and the empirical c.d.f.'s. Empirically, we found 
that D = 0.1 provides a good separation of power-law 
pages from those that are not. In total, there are 906 
power-law pages and 56 non-power-law pages. 
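
As a rough sketch of the statistic $D$, the code below compares the empirical c.d.f. of a page's per-editor edit counts with the c.d.f. of a discrete power law. The fitting procedure is not specified in the text, so the model here (a power law truncated to 1..`v_max` with a given coefficient), the direction of the threshold test, and all names and sample data are assumptions for illustration.

```python
from collections import Counter

def ks_statistic(values, model_cdf):
    """Maximum absolute difference between the empirical c.d.f. of integer
    `values` and `model_cdf`, checked at every integer up to max(values)."""
    n = len(values)
    counts = Counter(values)
    cumulative, d = 0, 0.0
    for v in range(1, max(values) + 1):
        cumulative += counts.get(v, 0)
        d = max(d, abs(cumulative / n - model_cdf(v)))
    return d

def discrete_power_law_cdf(alpha, v_max):
    """C.d.f. of p_v ~ v**-alpha on the integers 1..v_max (truncated support)."""
    weights = [v ** (-alpha) for v in range(1, v_max + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return lambda v: cdf[min(int(v), v_max) - 1]

D_THRESHOLD = 0.1  # separation value reported in the text

edits = [1, 1, 1, 2, 2, 3, 5, 9]  # hypothetical per-editor edit counts
d = ks_statistic(edits, discrete_power_law_cdf(alpha=2.0, v_max=max(edits)))
is_power_law = d <= D_THRESHOLD   # assumed direction: small D means a good fit
```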

Efficiency vs. Readership

One of the main questions to answer is whether the or- 
der or efficiency of a page, measured by entropy-based 
metrics, corresponds to its quality. Here we look at a 
page's readership, the number of clicks it gets during 
some time interval. We use the readership data of one
week in February 2009, but we noticed that using data from
other points in time within half a year yields
similar behavior. In Figures 6(a-d), we summarize how
readership relates to (a) entropy, (b) entropy reduction,
(c) entropy efficiency, and (d) total energy for each page,
in which non-power-law pages are marked in red. The
correlation coefficients ($\rho$) between readership and these
metrics are also shown on the figures.

For the power-law pages (black circles), high readership
is associated with low entropy (Figure 6(a), $\rho = -0.67$), high
entropy reduction (Figure 6(b), $\rho = 0.51$), and high entropy
efficiency (Figure 6(c), $\rho = 0.70$), all suggesting
that higher order and efficiency are associated with higher
readership. This is remarkable because nothing in how
these metrics are computed connects them to
readership. One intuitive explanation is: as
the editing structure becomes more ordered and efficient,
the quality of the produced content improves, and thus
the page draws more readers.

Note also that for power-law pages, content quantity (in this
case, the total energy) does not affect readership much,
since there is only a weak correlation ($\rho = 0.37$)
between total energy and readership in Figure 6(d).
Therefore we claim that quantity does not imply quality,
but efficiency does.

On the other hand, all non-power-law pages have low
readership: the maximum readership of non-power-law
pages is 1111, whereas the medians of all pages and of power-law
pages are 11163 and 11964, respectively. In addition,
there is a positive correlation between entropy
and readership (instead of negative for power-law cases)
and a negative correlation between entropy reduction and
readership (instead of positive). The total energy of
non-power-law pages is significantly lower. There is,
however, a positive correlation between total energy and
readership for non-power-law cases ($\rho = 0.60$). The
most interesting fact is that entropy efficiency is
positively correlated with readership in both power-law
($\rho = 0.70$) and non-power-law ($\rho = 0.45$) cases.

In addition to total energy, we have also analyzed the
correlation between total number of edits and readership
(Figure 7). Note that there is almost no correlation
between these two, for both power-law ($\rho = 0.05$) and
non-power-law cases ($\rho = 0.12$). In addition, although
non-power-law pages are low in readership, they in fact
have more edits than the power-law ones in terms of
mean (9425.9 > 6537.4) and median (5532.5 > 4677.5).
The total energy, however, correctly separates power-law
from non-power-law pages, i.e., low energy ones are not
power-law. This further reinforces our choice of using
logarithmic energy in thermodynamic equations in this
context.

Figure 7. Total edits vs. readership and number of editors,
where non-power-law pages are marked in red.



Efficiency vs. Editor Base 

In this section we would like to show what actually correlates
the most with efficiency and readership. The answer
is, surprisingly, the number of editors. Figures 6(e-h)
summarize how the same metrics as in Figures 6(a-d) correlate
with a page's number of editors, where the correlation
coefficients are shown on the figures. First of all, for
all non-power-law pages, the number of editors is small
(< 100). For these pages, in contrast to power-law pages, the number
of editors is positively correlated with entropy ($\rho = 0.66$)
and slightly negatively correlated with entropy reduction
($\rho = -0.25$). The most interesting fact, however, is that
the number of editors is highly positively correlated with
entropy efficiency in both power-law ($\rho = 0.74$) and non-power-law
($\rho = 0.90$) cases. And, not too surprisingly,
but still interestingly, the total energy is highly positively
correlated with the number of editors, in both power-law
($\rho = 0.87$) and non-power-law ($\rho = 0.89$) cases. The simple
fact is that the more editors, the more efficiency, and
the more efficiency, the better the quality, and these three
metrics may positively reinforce each other.

Figure 6. Top: The Readership of pages versus (a) Entropy, (b) Entropy Efficiency, (c) Entropy Reduction, and (d)
Total Energy. Bottom: The number of editors of pages versus (e) Entropy (f) Entropy Efficiency (g) Entropy Reduction
and (h) Total Energy. Black circles indicate power-law pages and red non-power-law.

From Figures 6(e-h), we see that the separation between
clusters becomes clearer for all metrics. Roughly 100 editors
seems to be the boundary between power-law and
non-power-law pages: before this boundary (non-power-law
pages), an increase in editors results in lower order,
characterized by increased entropy and decreased entropy
reduction. After this boundary (power-law pages),
more editors result in higher order, i.e., lower entropy ($\rho$
= -0.64) and higher entropy reduction ($\rho = 0.87$). This
suggests that the system may have a phase transition
at the boundary, growing from order to disorder and
then from disorder to order. In both cases, however, the
entropy efficiency and total energy grow with the number
of editors.

Another interesting distinction introduced by the transition
is elitism versus wisdom of the crowd. From Figure 7(b),
we see that non-power-law pages show a form
of elitism, in which relatively few elite editors
single-handedly contribute a disproportionate number
of edits. In contrast, the power-law pages show a form
of crowd wisdom, characterized by many more editors
each contributing a comparable, or slightly smaller, number
of edits. Since Figure 6 already shows that power-law
pages tend to have more readership than non-power-law
ones, this implies that Wikipedia is by nature a true
medium of the masses, where pages produced by crowd
wisdom have higher quality and thus more readership
than those produced by a few elites.

Figure 8. The power-law slope $\alpha$ of pages versus entropy
efficiency, where non-power-law pages are marked in red.

We have shown that a high power-law coefficient (or low
temperature) implies high entropy efficiency. It is interesting
that this is also true for non-power-law cases in
our data. Figure 8 provides the scatterplot of each
page's $\alpha$ against its entropy efficiency.

In conclusion, we claim that (1) there are positive correlations
and reinforcement among the number of editors,
the efficiency of edit distributions among editors, and
the readership of pages, (2) although the total energy of
a page does not correlate with the quality/readership of
the page, it clearly identifies the group of bad pages (i.e.,
low energy ones), and (3) there is a phase transition in
the entropy measurements with the number of editors.

V. CONCLUSION 

We have studied the efficiency of social collaborative be- 
haviors through thermodynamic principles; in particular, 
we discovered that (1) editors' energy levels correspond to
the logarithm of their number of edits, and the power-law
of edit distributions arises naturally from thermodynamic
principles, and (2) while entropy or entropy
reduction characterizes order, maximizing entropy efficiency
is one of the basic thermodynamic principles. By
applying these measurements to the Wikipedia dataset,
we see that (1) Wikipedia is becoming more efficient, (2)
entropy efficiency is correlated with the quality of the
social collaboration, and (3) there is evidence suggestive of a phase
transition at a particular number of editors:
the system may self-organize into efficient and ordered
states if the threshold is passed. Note that although we
have used "number of edits" as the source of contribu- 
tions, such analysis is also applicable to other metrics, 
e.g., length of contributions from editors. 

In future work, we would like to understand what causes
the phase transition by developing both microscopic and
macroscopic evolutionary models of editing behaviors.
Such models may give insights and predictions for the
success and failure of social collaborations.

APPENDIX

Theorem 2. Given a collection $\{v_i \mid i = 1..N\}$ satisfying
a power-law distribution with power-law coefficient $\alpha$,
let the average energy be $E = \frac{1}{N}\sum_{i=1}^{N} \log(v_i)$. If the minimum
value of $v$ is 1, $E = \frac{1}{\alpha - 1}$.

Proof. According to (Newman 2005), we have $\alpha = 1 + N \left[\sum_{i=1}^{N} \log\left(\frac{v_i}{v_{\min}}\right)\right]^{-1}$.
Since $v_{\min} = 1$, we have

$\alpha = 1 + \frac{N}{\sum_{i=1}^{N} \log(v_i)} = 1 + \frac{1}{E}$,

which gives $E = \frac{1}{\alpha - 1}$.
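
The relationship in Theorem 2 can be checked directly with the estimator cited from (Newman 2005). The sketch below is illustrative only; the function names and the sample edit counts are hypothetical.

```python
import math

def estimate_alpha(values, v_min=1):
    """Power-law coefficient alpha = 1 + N / sum(log(v_i / v_min)) (Newman 2005)."""
    logs = [math.log(v / v_min) for v in values if v >= v_min]
    total = sum(logs)
    if total <= 0:
        raise ValueError("need at least one value above v_min")
    return 1.0 + len(logs) / total

def average_log_energy(values):
    """Average energy E: the mean of log(v_i) over the collection."""
    return sum(math.log(v) for v in values) / len(values)

edits = [1, 1, 2, 3, 8, 20]   # hypothetical per-editor edit counts, minimum 1
alpha = estimate_alpha(edits)
E = average_log_energy(edits)
assert math.isclose(E, 1.0 / (alpha - 1.0))   # the identity of Theorem 2
```

With $v_{\min} = 1$ the identity holds for any sample, since the estimator is then just $1 + 1/E$ by construction.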



